Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changes in trim_silence function regarding the agressive trimming issue #46

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

amoljagadambe
Copy link

@amoljagadambe amoljagadambe commented May 18, 2021

How to use this template

Under each heading below is a short outline of the information required. When submitting a PR, please delete this text and replace it with your own.

The CLA section can be deleted entirely.

Description

update the trimming function using rolling windows to create the mask around the signal
{“fixes #{35}”}

If needed follow up with as much detail as required.

Type of PR

  • Bugfix
  • Feature implementation
  • [ *] Refactor of code (without functional changes)
  • Documentation improvements
  • Test improvements

Documentation

The most important and tweakable part is threshold_value do play around with the value to get your desired trimming

CLA

To protect you, the project, and those who choose to use Mycroft technologies in systems they build, we ask all contributors to sign a Contributor License Agreement.

This agreement clarifies that you are granting a license to the Mycroft Project to freely use your work. Additionally, it establishes that you retain the ownership of your contributed code and intellectual property. As the owner, you are free to use your code in other work, obtain patents, or do anything else you choose with it.

If you haven't already signed the agreement and been added to our public Contributors repo then please head to https://mycroft.ai/cla to initiate the signing process.

@krisgesling krisgesling added the CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors) label May 19, 2021
Copy link
Collaborator

@krisgesling krisgesling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there, thanks for helping out with the project!

This is an interesting approach. Quite different to what I was expecting to address this issue.

The biggest question I have is whether the benefits of this approach outweigh the added complexity of the implementation?

The complexity arises in that we would be adding both numpy and pandas as requirements for this small task. It's also more complex for people wanting to read, understand and work on the code in the future.

Interested to hear your thoughts, and what the benefits that this approach brings over doing something like adding a manual buffer of x seconds to the start and end trims.

Thanks again :)

@@ -1,28 +1,28 @@
"""audio processing, etc"""
from pydub import AudioSegment
import pandas as pd
import numpy as np
import soundfile as sf


class Audio:
silence_threshold = -50.0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this would no longer be used anywhere is that right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it will be used in saving the audio

while sound[trim_ms:trim_ms + Audio.chunk_size].dBFS \
< Audio.silence_threshold and trim_ms < len(sound):
trim_ms += Audio.chunk_size
def _detect_leading_silence(sound: bytearray, sample_rate: int) -> list:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this method should probably be changed to reflect the new nature of it. It is no longer "detecting leading silence" it is generating a mask.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes I am working on it



class Audio:
silence_threshold = -50.0
chunk_size = 10
threshold_value = 0.0007 # Tweak the value of threshold to get the aggressive trimming
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this threshold value represent? ie what are the units?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the mask threshold value it will cut the mean value got from rolling windows and discard the value below threshold

Comment on lines 15 to 17
y_mean = y.rolling(window=int(sample_rate / 20),
min_periods=1,
center=True).max()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you break this down for me? What are we getting from this and how do we know that detects silence?


return trim_ms
return [True if mean > Audio.threshold_value else False for mean in y_mean]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result of a comparison is already a Boolean, eg x > y evaluates to either True or False. So this can be simplified to something like:

return [mean > Audio.threshold_value for mean in y_mean]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup I think this is right

trimmed_sound = sound[start_trim:duration - end_trim]
sound, rate = sf.read(path + ".wav")
mask = Audio._detect_leading_silence(sound, rate)
trimmed_sound = sound[mask]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this going to remove all periods of silence, rather than only silence at the beginning and end of the clip?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will remove all the silence from signal

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be misunderstanding but won't this also remove any apparent "silence" during the speech? Meaning any natural pauses will be cut out and the speech will sound very run together?

I don't think this is the behaviour we are after for TTS recording. Is that your intended use case?

Comment on lines 3 to 5
import pandas as pd
import numpy as np
import soundfile as sf
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We generally keep these as their full length names. This makes the code throughout more readable as you don't need to return and check what sf refers to for example.

@amoljagadambe
Copy link
Author

I think you above statement is right NumPy and pandas are adding stress into the alpine version of python. meanwhile, I am also working on a different approach which is less heavy and computationally efficient

@amoljagadambe
Copy link
Author

above approach also need some tweak in dockerfile

@amoljagadambe
Copy link
Author

'ffmpeg -i {} -ab 160k -ac 2 -ar 44100 -vn {}.wav -y'.format( webm_file_name, path )

why are we using 2 channels while saving the file

@krisgesling
Copy link
Collaborator

Hey, glad that you're finding the project helpful and able to modify it to fit your use case.

The removal of all silence makes more sense now as it seems you're recording single words rather than whole sentences.

It feels like these changes would be well suited to be configuration options set in the docker-compose.yaml eg:

  • stereo vs mono
  • remove all silence vs trim start and finish

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants